GES DISC - MERRA2

Reading MERRA2 Data Using Kerchunk Reference File

Many of NASA’s current and legacy data collections are archived in netCDF4 format. On their own, netCDF4 files are not cloud optimized, so reading them from a working environment deployed in the cloud can take about as long as reading them from a personal/local work environment. Using Kerchunk, we can treat these files as cloud-optimized assets by creating a metadata JSON file that describes existing netCDF4 files, their chunks, and where to access them. The JSON reference files can then be read using Zarr and Xarray for efficient reads and fast processing.
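
To make the idea of a reference file concrete, here is a toy illustration using only the standard library. It is not Kerchunk's implementation: it only shows the core idea that each Zarr key maps either to small inline data or to a `[path, offset, length]` byte range inside an existing file, so a reader can fetch just the bytes it needs. The file, keys, and `read_reference` helper are hypothetical.

```python
import json
import os
import tempfile

def read_reference(refs, key):
    """Resolve one key of a kerchunk-style reference dict to bytes (toy version).

    Each value is either inline data (a str) or a [path, offset, length]
    triple pointing at a byte range inside an existing file.
    """
    target = refs[key]
    if isinstance(target, str):          # small data stored inline in the JSON
        return target.encode()
    path, offset, length = target        # byte range inside the original file
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Stand-in "data file" holding two 4-byte chunks back to back
tmpdir = tempfile.mkdtemp()
data_path = os.path.join(tmpdir, "example.bin")
with open(data_path, "wb") as f:
    f.write(b"AAAABBBB")

reference = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "var/0": [data_path, 0, 4],
        "var/1": [data_path, 4, 4],
    },
}
print(read_reference(reference["refs"], "var/1"))  # b'BBBB'
```

In the real workflow, the paths are S3 URLs and fsspec's reference filesystem performs these byte-range reads.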

Requirements

1. AWS instance running in us-west-2

NASA Earthdata Cloud data in S3 can be directly accessed via temporary credentials; this access is limited to requests made within the US West (Oregon) (code: us-west-2) AWS region.

2. Earthdata Login

An Earthdata Login account is required to access data, as well as to discover restricted data, from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.

3. netrc File

You will need a netrc file containing your NASA Earthdata Login credentials in order to execute this notebook. A netrc file can be created manually in a text editor and saved to your home directory. For additional information, see: Authentication for NASA Earthdata.
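
A netrc entry for Earthdata Login is a single line naming the host urs.earthdata.nasa.gov followed by your login and password. Below is a minimal sketch for writing and checking that entry with Python's standard `netrc` module. The helper names are hypothetical, and writing to `~/.netrc` replaces any existing file at that path.

```python
import netrc
import os

EARTHDATA_HOST = "urs.earthdata.nasa.gov"

def write_earthdata_netrc(path, username, password):
    """Write a netrc entry for Earthdata Login and restrict it to the owner."""
    with open(path, "w") as f:
        f.write(f"machine {EARTHDATA_HOST} login {username} password {password}\n")
    os.chmod(path, 0o600)  # netrc files should not be readable by other users

def read_earthdata_login(path):
    """Return (login, password) for Earthdata Login from a netrc file."""
    login, _account, password = netrc.netrc(path).authenticators(EARTHDATA_HOST)
    return login, password

# Example (overwrites any existing file at that path):
# write_earthdata_netrc(os.path.expanduser("~/.netrc"), "<your_username>", "<your_password>")
```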

Import required packages

import requests
import xarray as xr
import ujson
import s3fs
import fsspec
from tqdm import tqdm
from glob import glob
import os
import pathlib
import hvplot.xarray

from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr

# Opening the reference files with xarray raises a SerializationWarning for each variable; the cause still needs investigating, so warnings are suppressed here
import warnings
warnings.simplefilter("ignore")

Create a Dask client to generate the reference files in parallel

Generating a Kerchunk reference file can take some time, depending on the internal structure of the data. Dask lets us generate the reference files in parallel, speeding up the overall process.

import dask
from dask.distributed import Client
client = Client(n_workers=4)
client

Client-3e6c0be3-d18c-11ec-809e-527eee20f3f0

Connection method: Cluster object    Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Get temporary S3 credentials

Request temporary S3 credentials from the GES DISC s3credentials endpoint; s3fs passes them to AWS. Note that these credentials expire after one hour and must then be refreshed.

s3_cred_endpoint = {
    'podaac':'https://archive.podaac.earthdata.nasa.gov/s3credentials',
    'lpdaac':'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials',
    'ornldaac':'https://data.ornldaac.earthdata.nasa.gov/s3credentials',
    'gesdisc':'https://data.gesdisc.earthdata.nasa.gov/s3credentials'
}

def get_temp_creds(provider='gesdisc'):
    # No auth argument is passed, so requests reads the Earthdata Login credentials from ~/.netrc
    temp_creds_url = s3_cred_endpoint[provider]
    response = requests.get(temp_creds_url)
    response.raise_for_status()  # stop here if authentication or the request failed
    return response.json()

temp_creds_req = get_temp_creds()
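
Because the credentials expire after one hour, long sessions need to fetch new ones. Below is a minimal caching sketch. It assumes a conservative local lifetime of 55 minutes measured from when the credentials were fetched, rather than parsing any expiry field in the response. `TemporaryCredentials` is a hypothetical helper; `fetch` would be `get_temp_creds` from above.

```python
import time

class TemporaryCredentials:
    """Cache temporary S3 credentials and re-fetch them once they are stale.

    `fetch` is any zero-argument callable returning the credentials dict.
    Credentials are treated as stale `max_age` seconds after being fetched
    (default 55 minutes, leaving a margin before the one-hour expiry).
    """

    def __init__(self, fetch, max_age=55 * 60, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._creds = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._creds is None or now - self._fetched_at >= self._max_age:
            self._creds = self._fetch()   # first call, or the cached copy is stale
            self._fetched_at = now
        return self._creds
```

Usage would look like `creds = TemporaryCredentials(get_temp_creds)` followed by `temp_creds_req = creds.get()`. Filesystem objects created earlier keep the old keys, so after a refresh you would also rebuild the s3fs filesystem and any `remote_options` built from the credentials.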

Directly access a single netCDF4 file

Pass the temporary credentials to an s3fs filesystem object to access the S3 assets.

fs = s3fs.S3FileSystem(
    anon=False,
    key=temp_creds_req['accessKeyId'],
    secret=temp_creds_req['secretAccessKey'],
    token=temp_creds_req['sessionToken']
)
url = 's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4'
s3_file_obj = fs.open(url, mode='rb')

Time how long it takes to directly access a cloud asset, for comparison later.

%%time

xr_ds = xr.open_dataset(s3_file_obj, chunks='auto', engine='h5netcdf')
xr_ds
CPU times: user 2.9 s, sys: 228 ms, total: 3.13 s
Wall time: 7.53 s
<xarray.Dataset>
Dimensions:   (lon: 576, lat: 361, time: 24)
Coordinates:
  * lon       (lon) float64 -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
  * lat       (lat) float64 -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
  * time      (time) datetime64[ns] 2019-05-01T00:30:00 ... 2019-05-01T23:30:00
Data variables: (12/47)
    CLDPRS    (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    CLDTMP    (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    DISPH     (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    H1000     (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    H250      (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    H500      (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    ...        ...
    V250      (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    V2M       (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    V500      (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    V50M      (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    V850      (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    ZLCL      (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
Attributes: (12/30)
    History:                           Original file generated: Sat May 11 22...
    Comment:                           GMAO filename: d5124_m2_jan10.tavg1_2d...
    Filename:                          MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4
    Conventions:                       CF-1
    Institution:                       NASA Global Modeling and Assimilation ...
    References:                        http://gmao.gsfc.nasa.gov
    ...                                ...
    Contact:                           http://gmao.gsfc.nasa.gov
    identifier_product_doi:            10.5067/VJAFPLI1CSIV
    RangeBeginningDate:                2019-05-01
    RangeBeginningTime:                00:00:00.000000
    RangeEndingDate:                   2019-05-01
    RangeEndingTime:                   23:59:59.000000
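
`%%time` is an IPython magic, so it only works inside a notebook. If you run these steps as a plain Python script, a small standard-library timer does the same job for wall-clock time (it does not report the CPU times that `%%time` also prints). `wall_timer` is a hypothetical helper.

```python
import time
from contextlib import contextmanager

@contextmanager
def wall_timer(label, results):
    """Record the elapsed wall-clock time of a with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}
with wall_timer("sum", timings):
    total = sum(range(1_000_000))
print(f"sum took {timings['sum']:.3f} s")
```

For example, wrapping the `xr.open_dataset(...)` call in `with wall_timer("direct_open", timings):` records the direct-access time for later comparison.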

Specify a list of S3 URLs

Data Collection: MERRA2_400.tavg1_2d_slv_Nx
Time Range: 05/01/2019 - 05/31/2019

urls = ['s3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190502.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190503.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190504.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190505.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190506.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190507.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190508.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190509.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190510.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190511.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190512.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190513.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190514.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190515.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190516.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190517.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190518.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190519.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190520.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190521.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190522.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190523.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190524.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190525.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190526.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190527.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190528.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190529.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190530.nc4',
    's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190531.nc4']
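
Rather than typing 31 URLs by hand, you can generate the same list from the date range. This is a minimal sketch, assuming every file follows the naming pattern above (bucket path, `M2T1NXSLV.5.12.4`, and the `MERRA2_400` stream prefix). The stream number may differ for other years, so check it before reusing this for other dates. `merra2_slv_urls` is a hypothetical helper.

```python
from datetime import date, timedelta

BASE = "s3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4"

def merra2_slv_urls(start, end, stream="400"):
    """Build daily M2T1NXSLV S3 URLs from start to end (both inclusive)."""
    urls = []
    day = start
    while day <= end:
        urls.append(
            f"{BASE}/{day:%Y/%m}/MERRA2_{stream}.tavg1_2d_slv_Nx.{day:%Y%m%d}.nc4"
        )
        day += timedelta(days=1)
    return urls

# Produces the same 31 URLs as the hand-written list above
urls = merra2_slv_urls(date(2019, 5, 1), date(2019, 5, 31))
```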

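Generate the Kerchunk reference files

Define a function that generates the Kerchunk reference file for a single URL and writes it to the jsons directory. Generating these files can take some time.

def gen_json(u):
    # Open options intended to limit caching while kerchunk scans the file
    so = dict(
        mode="rb",
        anon=False,
        default_fill_cache=False,
        default_cache_type="none"
    )
    with fs.open(u, **so) as infile:
        # Map each HDF5 chunk to its byte range in the file; chunks under 300 bytes are stored inline in the JSON
        h5chunks = SingleHdf5ToZarr(infile, u, inline_threshold=300)
        # translate() returns the reference dict, saved as jsons/<netCDF4 file name>.json
        with open(f"jsons/{u.split('/')[-1]}.json", 'wb') as outf:
            outf.write(ujson.dumps(h5chunks.translate()).encode())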

Create the output jsons directory if it does not already exist.

pathlib.Path('./jsons/').mkdir(exist_ok=True)

Use dask.delayed to wrap gen_json in one task per URL, then run all of the tasks in parallel with dask.compute.

%%time

reference_files = []
for url in urls:
    ref = dask.delayed(gen_json)(url)
    reference_files.append(ref)

reference_files_compute = dask.compute(*reference_files)
    CPU times: user 29 s, sys: 11.1 s, total: 40 s
    Wall time: 11min 6s
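
Generating all 31 reference files took about 11 minutes here. If a run is interrupted, it is convenient to regenerate only the files that are missing. This is a minimal sketch: `reference_path` and `pending_urls` are hypothetical helpers that assume gen_json's naming scheme (jsons/<netCDF4 file name>.json). A file left half-written by an interrupted run would still be counted as done.

```python
import os

def reference_path(url, out_dir="jsons"):
    """Path of the reference JSON that gen_json writes for a given URL."""
    return os.path.join(out_dir, url.split("/")[-1] + ".json")

def pending_urls(urls, out_dir="jsons"):
    """Return only the URLs whose reference JSON does not exist yet."""
    return [u for u in urls if not os.path.exists(reference_path(u, out_dir))]
```

With these helpers, the task list becomes `reference_files = [dask.delayed(gen_json)(u) for u in pending_urls(urls)]`.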

Create a Python list with the paths to the reference files, sorted by file name.

reference_list = sorted(glob('./jsons/*.json'))
reference_list
    ['./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190502.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190503.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190504.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190505.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190506.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190507.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190508.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190509.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190510.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190511.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190512.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190513.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190514.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190515.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190516.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190517.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190518.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190519.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190520.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190521.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190522.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190523.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190524.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190525.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190526.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190527.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190528.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190529.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190530.nc4.json',
     './jsons/MERRA2_400.tavg1_2d_slv_Nx.20190531.nc4.json']
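
Because the file names embed the date as YYYYMMDD, sorting the paths also puts them in chronological order. However, glob returns only the files that exist, so a day whose reference failed to generate would silently be missing from the time series. Below is a sketch of a coverage check, assuming the naming pattern above. `missing_days` is a hypothetical helper.

```python
import re
from datetime import date, timedelta

def missing_days(paths, start, end):
    """Return dates in [start, end] that have no reference file among paths.

    Assumes file names contain the date as YYYYMMDD right before '.nc4',
    as in MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4.json.
    """
    found = set()
    for p in paths:
        m = re.search(r"\.(\d{8})\.nc4", p)
        if m:
            s = m.group(1)
            found.add(date(int(s[:4]), int(s[4:6]), int(s[6:])))
    missing = []
    day = start
    while day <= end:
        if day not in found:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

For this notebook, `missing_days(reference_list, date(2019, 5, 1), date(2019, 5, 31))` should return an empty list.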

Read a single netCDF4 file using a Kerchunk reference file

Load the first reference file into a Python dictionary so it can be opened with xarray.

with open(reference_list[0]) as j:
    reference = ujson.load(j)
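
Before opening the reference with xarray, it can be useful to check what the JSON actually contains. This sketch assumes the version 1 layout (`{"version": 1, "refs": {...}}`), with one `<name>/.zarray` key per array and chunk entries that are either inline strings (possibly `base64:`-prefixed) or `[url, offset, length]` lists. `summarize_references` is a hypothetical helper.

```python
from collections import Counter

def summarize_references(reference):
    """Count arrays and chunk reference kinds in a kerchunk-style reference dict."""
    refs = reference.get("refs", reference)   # also accept an un-nested dict of refs
    variables = sorted(k[: -len("/.zarray")] for k in refs if k.endswith("/.zarray"))
    kinds = Counter()
    for key, value in refs.items():
        if key.split("/")[-1].startswith("."):
            kinds["metadata"] += 1           # .zgroup, .zattrs and .zarray entries
        elif isinstance(value, list):
            kinds["remote byte range"] += 1  # chunk still lives in the netCDF4 file
        else:
            kinds["inline"] += 1             # small chunk embedded in the JSON
    return variables, kinds
```

For example, `variables, kinds = summarize_references(reference)` lists the arrays found in the first file and shows how many chunks point back into the netCDF4 file on S3.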

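Set configuration options: s_opts applies when reading the reference JSON itself, and r_opts supplies the temporary S3 credentials used to read the remote netCDF4 files.

s_opts = {'skip_instance_cache':True}   # options for reading the reference JSON
r_opts = {'anon':False,
          'key':temp_creds_req['accessKeyId'],
          'secret':temp_creds_req['secretAccessKey'],
          'token':temp_creds_req['sessionToken']}    # options for the remote netCDF4 files on S3
fs_single = fsspec.filesystem("reference",
                              fo=reference,
                              ref_storage_args=s_opts,
                              remote_protocol='s3',
                              remote_options=r_opts)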
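
The same four credential fields are copied into the s3fs filesystem and into r_opts. Since the credentials expire after an hour, a small helper keeps both places consistent whenever you fetch new ones. `s3_options` is a hypothetical helper; the response keys it reads (accessKeyId, secretAccessKey, sessionToken) are the ones used above.

```python
def s3_options(creds):
    """Map an Earthdata s3credentials response to s3fs keyword arguments."""
    return {
        "anon": False,
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }
```

With this helper, both `fs = s3fs.S3FileSystem(**s3_options(temp_creds_req))` and `r_opts = s3_options(temp_creds_req)` are built from one place.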

Read in a single reference object. Each variable raises a SerializationWarning, which is suppressed here with the warnings module.
NOTE: the fill value, valid range, minimum, and maximum attributes DO NOT match the source file. This needs further investigation.

%%time

m = fs_single.get_mapper("")
ds_single = xr.open_dataset(m, engine="zarr", backend_kwargs={'consolidated':False}, chunks={})
ds_single
    CPU times: user 142 ms, sys: 3.29 ms, total: 146 ms
    Wall time: 354 ms
    <xarray.Dataset>
    Dimensions:   (time: 24, lat: 361, lon: 576)
    Coordinates:
      * lat       (lat) float64 -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
      * lon       (lon) float64 -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
      * time      (time) datetime64[ns] 2019-05-01T00:30:00 ... 2019-05-01T23:30:00
    Data variables: (12/47)
        CLDPRS    (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        CLDTMP    (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        DISPH     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        H1000     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        H250      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        H500      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        ...        ...
        V250      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        V2M       (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        V500      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        V50M      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        V850      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        ZLCL      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    Attributes: (12/30)
        Comment:                           GMAO filename: d5124_m2_jan10.tavg1_2d...
        Contact:                           http://gmao.gsfc.nasa.gov
        Conventions:                       CF-1
        DataResolution:                    0.5 x 0.625
        EasternmostLongitude:              179.375
        Filename:                          MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4
        ...                                ...
        TemporalRange:                     1980-01-01 -> 2016-12-31
        Title:                             MERRA2 tavg1_2d_slv_Nx: 2d,1-Hourly,Ti...
        VersionID:                         5.12.4
        WesternmostLongitude:              -180.0
        identifier_product_doi:            10.5067/VJAFPLI1CSIV
        identifier_product_doi_authority:  http://dx.doi.org/

Read multiple netCDF4 files using Kerchunk reference files

Open each reference file as an xarray dataset, then concatenate them along the time dimension into a single time series dataset.

%%time

ds_k = []
for ref in reference_list:
    # Use a new name so the s3fs filesystem `fs` used by gen_json is not overwritten
    fs_ref = fsspec.filesystem("reference",
                               fo=ref,
                               ref_storage_args=s_opts,
                               remote_protocol='s3',
                               remote_options=r_opts)
    m = fs_ref.get_mapper("")
    ds_k.append(xr.open_dataset(m, engine="zarr", backend_kwargs={'consolidated':False}, chunks={}))

ds_multi = xr.concat(ds_k, dim='time')

ds_multi
    CPU times: user 8.93 s, sys: 174 ms, total: 9.1 s
    Wall time: 14.9 s
    <xarray.Dataset>
    Dimensions:   (time: 744, lat: 361, lon: 576)
    Coordinates:
      * lat       (lat) float64 -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
      * lon       (lon) float64 -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
      * time      (time) datetime64[ns] 2019-05-01T00:30:00 ... 2019-05-31T23:30:00
    Data variables: (12/47)
        CLDPRS    (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        CLDTMP    (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        DISPH     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        H1000     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        H250      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        H500      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        ...        ...
        V250      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        V2M       (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        V500      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        V50M      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        V850      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
        ZLCL      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    Attributes: (12/30)
        Comment:                           GMAO filename: d5124_m2_jan10.tavg1_2d...
        Contact:                           http://gmao.gsfc.nasa.gov
        Conventions:                       CF-1
        DataResolution:                    0.5 x 0.625
        EasternmostLongitude:              179.375
        Filename:                          MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4
        ...                                ...
        TemporalRange:                     1980-01-01 -> 2016-12-31
        Title:                             MERRA2 tavg1_2d_slv_Nx: 2d,1-Hourly,Ti...
        VersionID:                         5.12.4
        WesternmostLongitude:              -180.0
        identifier_product_doi:            10.5067/VJAFPLI1CSIV
        identifier_product_doi_authority:  http://dx.doi.org/
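
xr.concat joins the datasets in list order without checking for gaps or duplicates. Each daily tavg1 file here holds 24 hourly averages stamped at half past the hour, so 31 days should give 744 time steps from 2019-05-01T00:30 to 2019-05-31T23:30, matching the output above. Below is a sketch that builds the expected stamps for comparison. `expected_hourly_times` is a hypothetical helper that assumes this half-past-the-hour convention.

```python
from datetime import datetime, timedelta

def expected_hourly_times(first_day, n_days):
    """Hourly stamps at HH:30 for n_days consecutive days starting at first_day."""
    start = datetime(first_day.year, first_day.month, first_day.day, 0, 30)
    return [start + timedelta(hours=h) for h in range(24 * n_days)]

times = expected_hourly_times(datetime(2019, 5, 1), 31)
print(len(times), times[0], times[-1])
# 744 2019-05-01 00:30:00 2019-05-31 23:30:00
```

For example, `ds_multi.indexes['time'].to_pydatetime().tolist() == times` should be True if no day is missing or repeated.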

Again, the fill value, valid range, minimum, and maximum attributes DO NOT match the source file. TODO: explore why the values are different.

ds_multi['T500']
    <xarray.DataArray 'T500' (time: 744, lat: 361, lon: 576)>
    dask.array<concatenate, shape=(744, 361, 576), dtype=float32, chunksize=(1, 91, 144), chunktype=numpy.ndarray>
    Coordinates:
      * lat      (lat) float64 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0
      * lon      (lon) float64 -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
      * time     (time) datetime64[ns] 2019-05-01T00:30:00 ... 2019-05-31T23:30:00
    Attributes:
        fmissing_value:  999999986991104.0
        long_name:       air_temperature_at_500_hPa
        standard_name:   air_temperature_at_500_hPa
        units:           K
        valid_range:     [-999999986991104.0, 999999986991104.0]
        vmax:            999999986991104.0
        vmin:            -999999986991104.0
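
One possible explanation for these odd-looking numbers is that the source file stores the attributes as the single-precision value 1e15. 1e15 cannot be represented exactly in float32, and the nearest float32 value, printed at double precision, is 999999986991104.0. If so, the values agree with the source and only their display differs. This sketch checks only the arithmetic; it does not confirm what the source attributes are. `as_float32` is a hypothetical helper.

```python
import struct

def as_float32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(as_float32(1e15))    # 999999986991104.0
print(as_float32(-1e15))   # -999999986991104.0
```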
# Commented out so the Quarto site renders; uncomment to plot T500 interactively
# ds_multi['T500'].hvplot.image(x='lon', y='lat')

References